ENH: Different initialization methods for LoRA #1189
Conversation
This PR adds the possibility to use different initialization methods for LoRA, which is a requirement for a completely backwards-compatible adoption of PEFT in diffusers.

Description

The default is still the same as always, namely the one from the reference implementation by Microsoft. On top of that, it is now possible to pass `init_lora_weights='gaussian'` to initialize the LoRA weights the way diffusers does by default, namely from a normal distribution scaled by 1/r. The init method currently applies to LoRA linear and conv layers, but not to embedding layers, which are always initialized from a normal distribution (and are probably irrelevant for diffusers). In the future, similar extensions could be added for other adapter methods.

Notes

For testing, a rather simple test is added which calculates the Kolmogorov-Smirnov distance between the weights of a LoRA layer and the expected distribution. If someone has a better idea for a test, please let me know.
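For illustration, here is a minimal usage sketch of the new option; the base model and target modules below are placeholders for the example, not part of this PR:

```python
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

# Placeholder base model; any model supported by PEFT works the same way.
base_model = AutoModelForCausalLM.from_pretrained("gpt2")

# Omitting init_lora_weights keeps the default (Microsoft reference) init.
# Passing "gaussian" initializes the LoRA weights from a normal distribution
# scaled by 1/r, matching the diffusers default.
config = LoraConfig(
    r=8,
    lora_alpha=16,
    target_modules=["c_attn"],  # placeholder target modules for the gpt2 example
    init_lora_weights="gaussian",
)
peft_model = get_peft_model(base_model, config)
```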
@@ -18,7 +18,9 @@
 extras["quality"] = ["black ~= 22.0", "ruff>=0.0.241", "urllib3<=2.0.0"]
 extras["docs_specific"] = ["hf-doc-builder"]
 extras["dev"] = extras["quality"] + extras["docs_specific"]
-extras["test"] = extras["dev"] + ["pytest", "pytest-cov", "pytest-xdist", "parameterized", "datasets", "diffusers<0.21.0"]
+extras["test"] = extras["dev"] + [
+    "pytest", "pytest-cov", "pytest-xdist", "parameterized", "datasets", "diffusers<0.21.0", "scipy"
+]
Why's the version restriction needed on the diffusers side?
It was pinned in #936 and can probably be unpinned now.
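As an aside, the `scipy` entry added in the diff above is presumably what enables the Kolmogorov-Smirnov check mentioned in the PR description. A rough sketch of what such a distribution test could look like, assuming a toy base model and illustrative module names (this is not the PR's actual test code):

```python
import torch
from scipy import stats
from peft import LoraConfig, get_peft_model

r = 8
# Toy base model; the module name "0" used in target_modules is illustrative.
base = torch.nn.Sequential(torch.nn.Linear(1000, 1000))
config = LoraConfig(r=r, target_modules=["0"], init_lora_weights="gaussian")
model = get_peft_model(base, config)

# Gather the lora_A weights of the default adapter and compare them against
# a normal distribution with std 1/r via the Kolmogorov-Smirnov test.
weights = torch.cat(
    [
        module.weight.detach().flatten()
        for name, module in model.named_modules()
        if name.endswith("lora_A.default")
    ]
).numpy()
statistic, p_value = stats.kstest(weights, "norm", args=(0.0, 1 / r))
# Crude threshold for a sketch; a real test would pick the tolerance more carefully.
assert p_value > 0.05, f"LoRA A weights deviate from N(0, 1/r) (p={p_value:.3f})"
```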
Very nice! Particularly, the tests are very thorough!
How should this be reflected in huggingface/diffusers#5419?
Hmm, not sure about that PR specifically. In general, for diffusers to use this feature, every time that a …
Hmm, the changes sound good to me. @patrickvonplaten WDYT?
Thank you @BenjaminBossan for adding support for different weight initialization methods for LoRA. The tests are really good; I never thought I would see statistical significance analysis in tests. LGTM! 🤩
Before merging, should we wait until we have finalized the change required on the diffusers side, in case we find that this PR doesn't quite cut it?
Makes sense.
Do you mean testing it on huggingface/diffusers#5388? I can do that tomorrow.
I can confirm that this works: https://wandb.ai/sayakpaul/dreambooth-lora/runs/bub8wjc3?workspace=user-sayakpaul
Great, thanks for checking it.